COMPARISON OF IMAGE SEGMENTATION METHOD IN IMAGE CHARACTER EXTRACTION PREPROCESSING USING OPTICAL CHARACTER RECOGNITION

Abstract

Today, many documents exist as digital images obtained from various sources, and these must be processable by a computer automatically. One form of document image processing is text feature extraction using OCR (Optical Character Recognition) technology. However, there are cases where the technology is unable to read characters accurately. This could be due to several factors, such as poor image quality or noise. To get an accurate result, the image must be of good quality, so it needs to be preprocessed. The preprocessing methods used in this study are Otsu Thresholding Binarization, Niblack, and Sauvola. The characters are extracted with the Tesseract library in Python. The test results show that direct extraction from the original image gives better results, with an average match rate of 77.27%. Meanwhile, the match rates with the other two methods were 70.27% and 69.67%, and with Niblack only 35.72%. In some cases in this research, however, the preprocessing methods gave better results.
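The three binarization rules named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the window size, the Niblack constant k = -0.2, the Sauvola constants k = 0.2 and R = 128, and all function names are illustrative choices. The `match_rate` helper uses a `difflib` similarity ratio as a stand-in, since the abstract does not say how the match rate was computed.

```python
from difflib import SequenceMatcher
from math import sqrt


def otsu_threshold(pixels):
    """Global Otsu threshold for a flat list of 8-bit gray levels (0..255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of those pixel levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def local_mean_std(img, r, c, half):
    """Mean and standard deviation of the window around (r, c), clipped to the image."""
    rows = img[max(0, r - half): r + half + 1]
    vals = [v for row in rows for v in row[max(0, c - half): c + half + 1]]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, sqrt(var)


def niblack_threshold(mean, std, k=-0.2):
    """Niblack local threshold: T = m + k * s (k is an assumed value)."""
    return mean + k * std


def sauvola_threshold(mean, std, k=0.2, R=128.0):
    """Sauvola local threshold: T = m * (1 + k * (s / R - 1)) (k, R assumed)."""
    return mean * (1 + k * (std / R - 1))


def binarize_global(img, t):
    """Pixels brighter than t become white (255), the rest black (0)."""
    return [[255 if v > t else 0 for v in row] for row in img]


def binarize_local(img, rule, half=1):
    """Apply a per-pixel threshold rule(mean, std) over a sliding window."""
    out = []
    for r, row in enumerate(img):
        out_row = []
        for c, v in enumerate(row):
            mean, std = local_mean_std(img, r, c, half)
            out_row.append(255 if v > rule(mean, std) else 0)
        out.append(out_row)
    return out


def match_rate(expected, recognized):
    """Assumed stand-in metric: difflib similarity ratio, as a percentage."""
    return 100.0 * SequenceMatcher(None, expected, recognized).ratio()
```

For a grayscale image stored as a list of rows, `binarize_global(img, otsu_threshold([v for row in img for v in row]))` gives the Otsu result, while `binarize_local(img, niblack_threshold)` and `binarize_local(img, sauvola_threshold)` give the two local variants. In a full pipeline the binarized image would then go to Tesseract, for example through the `pytesseract` wrapper's `image_to_string`. That wrapper is an assumption here, since the abstract only names the Tesseract library in Python. The recognized text would then be compared with the ground truth using `match_rate`.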

Similar resources

Image preprocessing for optical character recognition using neural networks

The primary task of this master’s thesis is to create a theoretical and practical basis for preprocessing printed text for optical character recognition using feed-forward neural networks. A demonstration application was created and its parameters were set according to the results of the experiments performed. Project definition and task determination 1. Write an introduction about the problematics of optica...

Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction

Two new, cost-effective thresholding algorithms for use in extracting binary images of characters from machine- or hand-printed documents are described. The creation of a binary representation from an analog image requires such algorithms to determine whether a point is converted into a binary one because it falls within a character stroke or a binary zero because it does not. This thresholding i...

Stroke Extraction from Gray-Scale Character Image

In this paper, a topographic feature classification method based on 4-directional scanning, and a stroke extraction method from skeletal pixels are proposed. Combination of the proposed methods is relatively fast and the resulting strokes are more acceptable.

Optical Character Recognition from Text Image

Optical Character Recognition (OCR) is a system that provides full alphanumeric recognition of printed or handwritten characters by simply scanning the text image. An OCR system interprets an image of printed or handwritten characters and converts it into a corresponding editable text document. The text image is divided into regions by isolating each line, then individual characters with spaces. Aft...

Image Normalization and Preprocessing for Gujarati Character Recognition

Pattern recognition has been an important area in computer vision applications. In the case of a planar image, there are four basic forms of geometric distortion caused by the change in camera location: translation, rotation, scaling and skew. So far, a number of methods have been developed to solve these distortions, such as moment invariants, Fourier descriptor, Hough transformation, shape m...

Journal

Journal title: Jurnal Teknik Informatika

Year: 2023

ISSN: 1979-9160, 2549-7901

DOI: https://doi.org/10.52436/1.jutif.2023.4.3.956